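Human learners can readily understand speech or a melody when it is presented slower or faster than usual. Although deep convolutional neural networks (CNNs) are extremely powerful at extracting information from time series, they require explicit training to generalize to different time scales. This paper presents a deep CNN that incorporates a temporal representation inspired by recent findings from neuroscience. In the mammalian brain, time is represented by populations of neurons with temporal receptive fields. Critically, the peaks of the receptive fields form a geometric series, so that the population codes a set of temporal basis functions over log time. Because memory for the recent past is a function of log time, rescaling the input results in a translation of the memory. The Scale-Invariant Temporal History Convolution network (SITHCon) builds a convolutional layer over this logarithmically distributed temporal memory. A max-pool operation results in a network that is invariant to rescalings of time, modulo edge effects. We compare the performance of SITHCon to a Temporal Convolution Network (TCN). Although both networks can learn classification and regression problems on univariate and multivariate time series f(t), only SITHCon generalizes to rescalings f(at). This property, inspired by findings from contemporary neuroscience and consistent with findings from cognitive psychology, may enable networks that learn with fewer training examples and fewer weights, and that generalize more robustly to out-of-sample data.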
translated by 谷歌翻译
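The claim that rescaling the input translates a log-time memory can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the SITHCon implementation: the "memory" is idealized as the signal sampled directly at geometrically spaced peaks, and the names `taus`, `memory`, `pooled`, and the kernel values are hypothetical.

```python
import numpy as np

# Geometric series of receptive-field peaks: tau_k = tau_0 * c**k
tau_0, c, n_cells = 1.0, 2.0, 12
taus = tau_0 * c ** np.arange(n_cells)

def f(t):
    """Toy signal: a Gaussian bump centred at t = 64."""
    return np.exp(-((t - 64.0) ** 2) / (2 * 10.0 ** 2))

def memory(signal, scale=1.0):
    """Idealized log-time memory (assumption): signal(scale * t) sampled at tau_k."""
    return signal(scale * taus)

# Rescale time by a = c**shift, i.e. compare f(t) with f(a t).
shift = 2
a = c ** shift
mem_f = memory(f)
mem_g = memory(f, scale=a)

# Rescaling by a = c**shift moves the memory pattern by `shift` cells.
assert np.allclose(mem_g[:-shift], mem_f[shift:])

# A convolution along the log-time axis followed by a global max-pool
# gives the same value for both inputs, as long as the pattern stays
# away from the edges of the memory.
kernel = np.array([0.2, 0.5, 0.3])  # arbitrary illustrative weights

def pooled(mem):
    return np.convolve(mem, kernel, mode="valid").max()

pooled_f, pooled_g = pooled(mem_f), pooled(mem_g)
print(np.argmax(mem_f), np.argmax(mem_g), pooled_f, pooled_g)
```

The equality holds exactly here because the scale factor is an integer power of the ratio `c`; for other scale factors, or for patterns near the ends of the memory, it holds only approximately, which is what "modulo edge effects" refers to.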
Modern telecom systems are monitored with performance and system logs from multiple application layers and components. Detecting anomalous events from these logs is key to identifying security breaches, resource over-utilization, critical/fatal errors, etc. Current supervised log anomaly detection frameworks tend to perform poorly on new types or signatures of anomalies with few or unseen samples in the training data. In this work, we propose a meta-learning-based log anomaly detection framework (LogAnMeta) for detecting anomalies from sequences of log events with few samples. LogAnMeta trains a hybrid few-shot classifier in an episodic manner. The experimental results demonstrate the efficacy of our proposed method.
Egocentric 3D human pose estimation with a single head-mounted fisheye camera has recently attracted attention due to its numerous applications in virtual and augmented reality. Existing methods still struggle in challenging poses where the human body is highly occluded or is closely interacting with the scene. To address this issue, we propose a scene-aware egocentric pose estimation method that guides the prediction of the egocentric pose with scene constraints. To this end, we propose an egocentric depth estimation network to predict the scene depth map from a wide-view egocentric fisheye camera while mitigating the occlusion of the human body with a depth-inpainting network. Next, we propose a scene-aware pose estimation network that projects the 2D image features and estimated depth map of the scene into a voxel space and regresses the 3D pose with a V2V network. The voxel-based feature representation provides the direct geometric connection between 2D image features and scene geometry, and further facilitates the V2V network to constrain the predicted pose based on the estimated scene geometry. To enable the training of the aforementioned networks, we also generated a synthetic dataset, called EgoGTA, and an in-the-wild dataset based on EgoPW, called EgoPW-Scene. The experimental results of our new evaluation sequences show that the predicted 3D egocentric poses are accurate and physically plausible in terms of human-scene interaction, demonstrating that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.
The coverage of different stakeholders mentioned in the news articles significantly impacts the slant or polarity detection of the concerned news publishers. For instance, the pro-government media outlets would give more coverage to the government stakeholders to increase their accessibility to the news audiences. In contrast, the anti-government news agencies would focus more on the views of the opponent stakeholders to inform the readers about the shortcomings of government policies. In this paper, we address the problem of stakeholder extraction from news articles and thereby determine the inherent bias present in news reporting. Identifying potential stakeholders in multi-topic news scenarios is challenging because each news topic has different stakeholders. The research presented in this paper utilizes both contextual information and external knowledge to identify the topic-specific stakeholders from news articles. We also apply a sequential incremental clustering algorithm to group the entities with similar stakeholder types. We carried out all our experiments on news articles on four Indian government policies published by numerous national and international news agencies. We also further generalize our system, and the experimental results show that the proposed model can be extended to other news topics.
The Indian e-commerce industry has evolved over the last decade and is expected to grow over the next few years. The focus has now shifted to turnaround time (TAT) due to the emergence of many third-party logistics providers and higher customer expectations. The key consideration for delivery providers is to balance their overall operating costs while meeting the TAT promised to their customers. E-commerce delivery partners operate through a network of facilities whose strategic locations help to run the operations efficiently. In this work, we identify the locations of hubs throughout the country and their corresponding mapping to the distribution centers. The objective is to minimize the total network costs while adhering to TAT. We use a genetic algorithm and leverage business constraints to reduce the solution search space and hence the solution time. The results indicate an improvement of 9.73% in TAT compliance compared with the current scenario.
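A genetic algorithm for hub selection of this kind can be sketched as follows. This is a minimal illustration and not the authors' formulation: the cost model (a fixed cost per open hub plus distance to the nearest open hub), the TAT limit handled as a soft penalty, and every name and parameter value (`network_cost`, `genetic_search`, `hub_cost`, `tat_limit`, `penalty`) are assumptions, and the data are randomly generated.

```python
import random

def network_cost(hubs_open, dist, hub_cost, tat_limit, penalty):
    """Fixed cost of open hubs plus distance from each distribution center to
    its nearest open hub, with a penalty when that distance breaks the limit."""
    open_idx = [h for h, is_open in enumerate(hubs_open) if is_open]
    if not open_idx:
        return float("inf")  # at least one hub must be open
    total = hub_cost * len(open_idx)
    for row in dist:  # row[h] = distance from one distribution center to hub h
        nearest = min(row[h] for h in open_idx)
        total += nearest + (penalty if nearest > tat_limit else 0.0)
    return total

def genetic_search(dist, hub_cost=50.0, tat_limit=30.0, penalty=200.0,
                   pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n_hubs = len(dist[0])

    def fitness(ind):
        return network_cost(ind, dist, hub_cost, tat_limit, penalty)

    # Each individual is a list of booleans: is candidate hub h open?
    pop = [[rng.random() < 0.5 for _ in range(n_hubs)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]      # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n_hubs)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    best = min(pop, key=fitness)
    return best, fitness(best)

# Hypothetical data: 6 candidate hubs and 20 distribution centers on a 100 x 100 map.
data_rng = random.Random(1)
hubs = [(data_rng.uniform(0, 100), data_rng.uniform(0, 100)) for _ in range(6)]
centers = [(data_rng.uniform(0, 100), data_rng.uniform(0, 100)) for _ in range(20)]
dist = [[((cx - hx) ** 2 + (cy - hy) ** 2) ** 0.5 for hx, hy in hubs]
        for cx, cy in centers]

best, cost = genetic_search(dist)
print(best, round(cost, 1))
```

Business constraints could shrink the search space by fixing some genes in advance (for example, hubs that must stay open), so that crossover and mutation only act on the remaining positions.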
Many problems can be viewed as forms of geospatial search aided by aerial imagery, with examples ranging from detecting poaching activity to human trafficking. We model this class of problems in a visual active search (VAS) framework, which takes as input an image of a broad area, and aims to identify as many examples of a target object as possible. It does this through a limited sequence of queries, each of which verifies whether an example is present in a given region. We propose a reinforcement learning approach for VAS that leverages a collection of fully annotated search tasks as training data to learn a search policy, and combines features of the input image with a natural representation of active search state. Additionally, we propose domain adaptation techniques to improve the policy at decision time when training data is not fully reflective of the test-time distribution of VAS tasks. Through extensive experiments on several satellite imagery datasets, we show that the proposed approach significantly outperforms several strong baselines. Code and data will be made public.
We present XKD, a novel self-supervised framework to learn meaningful representations from unlabelled video clips. XKD is trained with two pseudo tasks. First, masked data reconstruction is performed to learn modality-specific representations. Next, self-supervised cross-modal knowledge distillation is performed between the two modalities through teacher-student setups to learn complementary information. To identify the most effective information to transfer and also to tackle the domain gap between audio and visual modalities which could hinder knowledge transfer, we introduce a domain alignment strategy for effective cross-modal distillation. Lastly, to develop a general-purpose solution capable of handling both audio and visual streams, a modality-agnostic variant of our proposed framework is introduced, which uses the same backbone for both audio and visual modalities. Our proposed cross-modal knowledge distillation improves linear evaluation top-1 accuracy of video action classification by 8.4% on UCF101, 8.1% on HMDB51, 13.8% on Kinetics-Sound, and 14.2% on Kinetics400. Additionally, our modality-agnostic variant shows promising results in developing a general-purpose network capable of handling different data streams. The code is released on the project website.
The vision community has explored numerous pose-guided human editing methods due to their extensive practical applications. Most of these methods still use an image-to-image formulation in which a single image is given as input to produce an edited image as output. However, the problem is ill-defined in cases when the target pose is significantly different from the input pose. Existing methods then resort to in-painting or style transfer to handle occlusions and preserve content. In this paper, we explore the utilization of multiple views to minimize the issue of missing information and generate an accurate representation of the underlying human model. To fuse the knowledge from multiple viewpoints, we design a selector network that takes the pose keypoints and texture from images and generates an interpretable per-pixel selection map. After that, the encodings from a separate network (trained on a single-image human reposing task) are merged in the latent space. This enables us to generate accurate, precise, and visually coherent images for different editing tasks. We show the application of our network on two newly proposed tasks: Multi-view human reposing and Mix-and-match human image generation. Additionally, we study the limitations of single-view editing and scenarios in which multi-view provides a much better alternative.
The use of multilingual language models for tasks in low- and high-resource languages has been a success story in deep learning. In recent times, Arabic has been receiving widespread attention on account of its dialectal variance. While prior research studies have tried to adapt these multilingual models for dialectal variants of Arabic, it still remains a challenging problem owing to the lack of sufficient monolingual dialectal data and parallel translation data for such dialectal variants. Whether the limited dialectal data can be used to improve models trained on Arabic on its dialectal variants remains an open problem. First, we show that multilingual-BERT (mBERT) incrementally pretrained on Arabic monolingual data takes less training time and yields comparable accuracy when compared to our custom monolingual Arabic model, and beats existing models (by an average metric of $+6.41$). We then explore two continual pre-training methods: (1) using small amounts of dialectal data for continual finetuning and (2) using parallel Arabic-to-English data with a Translation Language Modeling loss function. We show that both approaches help improve performance on dialectal classification tasks ($+4.64$ avg. gain) when used on monolingual models.
Neural network-based approaches for solving partial differential equations (PDEs) have recently received special attention. However, the large majority of neural PDE solvers only apply to rectilinear domains, and do not systematically address the imposition of Dirichlet/Neumann boundary conditions over irregular domain boundaries. In this paper, we present a framework to neurally solve partial differential equations over domains with irregularly shaped (non-rectilinear) geometric boundaries. Our network takes in the shape of the domain as an input (represented using an unstructured point cloud, or any other parametric representation such as Non-Uniform Rational B-Splines) and is able to generalize to novel (unseen) irregular domains; the key technical ingredient to realizing this model is a novel approach for identifying the interior and exterior of the computational grid in a differentiable manner. We also perform a careful error analysis which reveals theoretical insights into several sources of error incurred in the model-building process. Finally, we showcase a wide variety of applications, along with favorable comparisons with ground truth solutions.